The purpose of this circuit is to let a sound card (a SoundBlaster AWE 64 in my case) interface with the phone line. Output from the sound card that normally goes to the speakers goes instead to the phone line, where it can be heard by a remote caller or on a local phone, and speech on the phone line is sent to the sound card via its microphone input. With this we can build a simple answering machine: detect a ring, answer the phone (using modem hardware and AT commands), and play a sound file from the computer's hard disk out the sound card to the phone: "Please leave a message". Then we can record the caller's message onto the hard disk for later playback or even inclusion in an email. With slightly more sophisticated software, we can analyze the input to the sound card from the phone in real time using FFT (fast Fourier transform) spectral analysis and detect touchtones entered by the caller. This opens the door to more elaborate voicemail or IVR (interactive voice response) capabilities, giving us a few of the features of special purpose cards made by folks like Dialogic and Natural Microsystems (albeit only on one channel per sound card). Here is the circuit:
Note that there are so-called "voice modems" that contain on-board circuitry to do voice capture, playback, and touchtone recognition via AT commands over the serial line. These are good for what they do. But I didn't have one, and I did have lots of old 2400, 9600, and 28.8 modems lying around that I thought I could put to use. That is why I designed this circuit. The circuit approach also has a few technical advantages over voice modems. For example, it would be an interesting project (and quite possible) to implement voice recognition in software on the PC side, since the voice arrives in real time. Bear in mind, though, that the phone line limits the input to a bandwidth of roughly 300 Hz to 3.4 kHz, which makes accurate voice recognition a difficult (but doable) prospect.
With that out of the way, here is a description of the circuit.
For the output stage (sound card->phone) I found that it is possible, and sounds OK, to simply capacitively couple the sound card speaker output to the phone line. It doesn't get much simpler than that! The capacitors block the 48V DC component of the phone line but pass the AC component of the sound output, which adds an AC current to the line. This is exactly what we want: our signal current is added to the off-hook current drawn by the modem. The modem acts as a current source and doesn't pin the voltage on the line, so adding an AC current is no problem. Note that the caps are wired in parallel with the modem, so our AC current does not interfere with the modem's own current (the modem draws current when it goes off hook).
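To see why a series capacitor blocks the DC on the line yet passes voice, here is a rough back-of-the-envelope sketch. It treats the coupling cap and the line as a simple first-order RC high-pass filter. The component values are assumptions for illustration only (a nominal 600 ohm line impedance and a 1 uF capacitor); they are not taken from the circuit in this article, so substitute your own.

```python
import math

def cutoff_hz(load_ohms, cap_farads):
    """-3 dB corner frequency of a series capacitor driving a resistive load."""
    return 1.0 / (2.0 * math.pi * load_ohms * cap_farads)

def gain(freq_hz, load_ohms, cap_farads):
    """Magnitude response of the first-order high-pass at freq_hz (0.0 to 1.0)."""
    ratio = freq_hz / cutoff_hz(load_ohms, cap_farads)
    return ratio / math.sqrt(1.0 + ratio * ratio)

if __name__ == "__main__":
    R = 600.0   # assumed nominal line impedance, ohms (hypothetical value)
    C = 1e-6    # assumed coupling capacitor, farads (hypothetical value)
    print("corner frequency: %.1f Hz" % cutoff_hz(R, C))
    for f in (0, 50, 300, 1000, 3000):
        print("%5d Hz -> gain %.2f" % (f, gain(f, R, C)))
```

At 0 Hz the gain is zero, which is the DC blocking described above. If the low end of the voice band sounds thin, a larger capacitor lowers the corner frequency.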
NOTE: This circuit won't work without a modem on the line in the position indicated. The modem must draw the "off hook" current to tell the Central Office switch that we are off hook. Of course, we will control the modem's on hook/off hook state via AT commands in our answering machine script.
NOTE: This circuit (probably) must sit *in front* of your modem, not behind it (in other words, it can't be fed from the modem's PHONE port). [Update 5/13/2001: Glenn Caughey kindly points out that when the modem goes off hook, it intentionally disconnects the phone jack output on the modem so that people picking up that phone at least can't interfere with the modem's communication on the line.]
The input stage was a bit tougher. Based on a schematic I saw on the web (see sources, below) I tried a simple capacitive coupling from the phone line to the mic input. This worked horribly, and I'm not sure why. There was a very loud buzzing hum in the background that overwhelmed the microphone input. Perhaps the microphone input was expecting a grounded signal and not a differential one? Not sure. One source seemed to call this effect "motorboating" and said it could probably be eliminated by switching tip and ring. That didn't work for me. [Update 5/13/2001: Glenn Caughey points out that the sound card is expecting an inductive load, not a capacitive load, and giving it a capacitive load is what causes the motorboating effect. The transformer gives the desired inductive load.] Next I tried a simple audio isolation transformer in conjunction with a simple RC network. The cap blocks any DC current flow through the transformer, and the resistor . . . well, I'm not sure what the resistor does. I just shorted it out and it sounds even better without it. If you have a suggestion about what it might have been for in the circuit I cribbed from, please email me.
Here are some construction tips: the whole device fits into one of those little phone jack boxes that Radio Shack or Fry's sells. This makes a handy little connector case. If you get an 8 wire phone jack box you can even use the spare screw terminals in the box as extra connection points for your capacitors, transformer, and wires, and you won't have to solder anything. Buy or scrounge a mic extension cable and a speaker cable appropriate for your sound card (you may be able to buy one cable, cut it in two, use the plug on each half for the sound card's mic and speaker out, and splice the wires from the cut ends into your circuit box). You can use the phone jack on the box for the incoming phone line (from the wall jack) and then splice in an old phone cable for the output to the modem. Voila!
Note: I am not totally happy with the output stage (sound card->phone) as shown above. There is a little too much background noise for my liking, even though it works satisfactorily as shown. Any suggestions for improvements are welcome.
Here are some links that I found helpful in the design of this circuit:
http://www.hut.fi/Misc/Electronics/circuits/teleinterface.html
http://massis.lcs.mit.edu/telecom-archives/archives/technical/phone.patches
http://www.egyed.com/phonework.html
BTW my friend Brandyn (www.sifter.org/~brandyn) constructed something that achieves a similar result to this circuit, but his approach does not involve any external circuitry! He tore apart an old modem and reverse engineered it to the extent that he found where the mic input and speaker output of his sound card could plug directly into appropriate interface points on the modem's circuit board. Brandyn is what you would call a "do-it-yourselfer". The details of where on the modem circuit board to make the connection will obviously vary from one make of modem to another, and it is possible that recent modems are so highly integrated that you cannot easily find an interface point at all. I didn't take this approach because I wanted hardware independence. My circuit does not depend on the details of any modem's circuitry, though Brandyn's approach is a pretty neat hack.
There are two main components to the software system as I implemented it.
First, there is an Expect script called 'ans' that controls the call-flow logic. By this I mean detecting when a call comes in, answering the phone by sending ATO to the modem, playing a pre-recorded voice prompt to the caller, and so on. For those who don't know, Expect is a language (really an extension to Ousterhout's Tcl) that makes it easy to control other programs via dialogs where you expect some output, send a string, expect some more output, send another string, etc. (Rather like a login script in the olden days.) In this case, the program being controlled by the Expect script is "cu", which is a bare bones tool for talking to a modem. So the Expect script invokes cu and says, "wait for RING", then when you get the ring, send "ATO" to go off hook, etc.
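The expect/send dialog just described can be sketched in a few lines. This is not the 'ans' script itself, only a minimal Python illustration of the same idea: scan the modem's output for RING and send the off-hook command once enough rings have arrived. The function name, its parameters, and the ring count are all hypothetical; the default "ATO" string is simply the command named above, passed in as a parameter so you can substitute whatever your modem needs.

```python
def answer_on_ring(modem_lines, send, off_hook_cmd="ATO", rings_before_answer=1):
    """Wait for RING in the modem's output, then send the off-hook command.

    modem_lines: any iterable of text lines coming from the modem
    send:        callable that writes one string to the modem
    Returns True if the call was answered, False if the input ran out first.
    """
    rings = 0
    for line in modem_lines:
        if line.strip() == "RING":          # the "expect" half of the dialog
            rings += 1
            if rings >= rings_before_answer:
                send(off_hook_cmd + "\r")   # the "send" half of the dialog
                return True
    return False

if __name__ == "__main__":
    # Simulated modem output; a real script would read the serial device.
    sent = []
    answered = answer_on_ring(["OK", "RING", "RING"], sent.append,
                              rings_before_answer=2)
    print(answered, sent)
```

Keeping the modem input and the send function as parameters means the same logic can be exercised against canned input before it is pointed at a live phone line.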
The Expect script also invokes other shell commands, which brings us to a special purpose C program that I wrote called 'recplay', the second major component in my implementation. 'recplay' can do several tasks depending on the command line arguments it is given. It can play a pre-recorded file that contains recorded human speech. The output goes out the sound card's speaker jack (and thus, via the circuit, is played out onto the phone!). 'recplay' can also record a new sound file from the sound card's mic input (and hence, via the circuit, from the phone). Finally, 'recplay' can recognize touch tones and return a text output message with the digits of any tones gathered. This text output can be collected by the Expect script and used to control the call flow logic ("press 1 to repeat, press 2 to save, press 3 to delete").
'recplay' makes extensive use of a canned FFT (fast Fourier transform) I cribbed from some book (probably Derenzo's "Interfacing" book from my Berkeley days). Explaining what an FFT does is a little beyond the scope of this article, but essentially an FFT tells you what frequencies make up an incoming signal. When a person presses a touchtone key, you get two very strong frequencies (a distinct pair for each number or symbol on the phone's keypad), and the FFT can easily recognize these frequencies and figure out what button was pushed. Understanding how the FFT actually does this magic requires a bit more work, but it is one of the more beautiful bits of mathematics you're likely to run across!
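To make the touchtone idea concrete, here is a small self-contained sketch in Python. It is not the code from 'recplay' (which is in C and uses a different canned FFT); it is a textbook recursive radix-2 FFT followed by a lookup of the strongest row and column frequency. The 8000 Hz sample rate, the 256-sample window, and the detection threshold are assumptions chosen for the illustration. The frequency pairs are the standard touchtone (DTMF) assignments.

```python
import cmath
import math

LOW_FREQS = (697, 770, 852, 941)        # keypad row frequencies, Hz
HIGH_FREQS = (1209, 1336, 1477, 1633)   # keypad column frequencies, Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")

def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def detect_key(samples, rate=8000, min_level=0.1):
    """Return the keypad character in a block of samples, or None."""
    n = len(samples)
    spectrum = fft(samples)

    def level(freq):
        # Magnitude at the FFT bin nearest to freq, scaled so that a
        # full-scale sine sitting exactly on a bin reads about 1.0.
        return abs(spectrum[round(freq * n / rate)]) / (n / 2)

    low = max(LOW_FREQS, key=level)
    high = max(HIGH_FREQS, key=level)
    if level(low) < min_level or level(high) < min_level:
        return None                      # too quiet to be a touchtone
    return KEYPAD[LOW_FREQS.index(low)][HIGH_FREQS.index(high)]

def make_tone(key, rate=8000, n=256):
    """Synthesize n samples of the two-frequency tone for one key (test helper)."""
    for row, keys in enumerate(KEYPAD):
        if key in keys:
            low, high = LOW_FREQS[row], HIGH_FREQS[keys.index(key)]
            return [0.5 * math.sin(2 * math.pi * low * t / rate)
                    + 0.5 * math.sin(2 * math.pi * high * t / rate)
                    for t in range(n)]
    raise ValueError("not a keypad character: %r" % key)

if __name__ == "__main__":
    print(detect_key(make_tone("5")))
```

A real detector would also want to reject speech that happens to contain energy at these frequencies, for example by requiring the same key across several consecutive blocks. That refinement is left out here to keep the sketch short.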
Note that your voice is also composed of many different frequency components. By writing a program to analyze these frequency components over time you can actually figure out the speech contained therein. If you are very clever.
Again let me say that this software is very alpha and will most likely require a bit of massaging to work right for you. However, I have tested it in a simple voicemail application, with an 'administrative back end' for listening to caller messages under touchtone control, and it works for me.
The software is available here: answeringmachine1.0-linux-src.tgz. If your browser is not configured to recognize .tgz as data, you may have better luck if you shift-click on the link rather than just clicking on the link.
One thing you will quickly discover is that there is a range of sound card mixer settings for which the circuit works best. You will have to experiment with the mixer tool to find out what these settings are for your environment. If the mic gain is too low, you won't be able to record the caller's voice. Too high, and you will overdrive the sound card input. If your speaker output gain is too low, the remote party won't be able to hear what your sound card says. And so on. It is best to save the mixer settings you find optimal and have your answering machine restore them at startup, so that your machine will function correctly if it reboots and loses the mixer settings. Note that the program resets the mixer settings to values that I found worked in my environment. You must make sure that the 'setmixer' script actually uses a valid mixer tool for your OS release (I use the aumix tool that comes with Red Hat Linux).
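One way to take some of the guesswork out of the mic gain setting is to record a test call and look at the sample values. The sketch below is an assumption-laden illustration, not part of the distributed software: it assumes signed 16-bit samples, and the function names and thresholds (a peak below 10% of full scale counts as "too quiet", more than 0.1% of samples at full scale counts as "overdriven") are arbitrary starting points.

```python
def level_report(samples, full_scale=32767):
    """Return (peak as a fraction of full scale, fraction of clipped samples)."""
    if not samples:
        return 0.0, 0.0
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return peak / full_scale, clipped / len(samples)

def gain_advice(samples, quiet_below=0.1, clipped_above=0.001):
    """Suggest which way to move the mic gain for a block of 16-bit samples."""
    peak, clipped = level_report(samples)
    if clipped > clipped_above:
        return "lower mic gain"     # input is being overdriven
    if peak < quiet_below:
        return "raise mic gain"     # caller would be hard to hear
    return "ok"

if __name__ == "__main__":
    print(gain_advice([100, -200, 150]))
    print(gain_advice([32767, -32768, 0, 100]))
    print(gain_advice([10000, -12000, 500]))
```

Run it on a recording made at each candidate mixer setting and keep the setting that reports "ok" for a normal speaking voice.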
One tool I have found valuable for a couple of purposes is the waveform analysis tool "SoundStudio". Be sure to check out this program from your favorite rpm archive.
2) Voice recognition. There are several open source voice recognition tools out there that could be interfaced to the phone using the circuit above. Or, figure it out and write your own.